Complete Hierarchical Cut-Clustering: An Analysis of Guarantee and Quality

Authors

  • Michael Hamann
  • Tanja Hartmann
  • Supervisor: Tanja Hartmann
Abstract

There are many algorithms for dividing a graph into parts, so-called clusters. An essential question is how dense these clusters are. This can be measured by the intra-cluster expansion. The cut-clustering algorithm as presented by Flake et al. [FTT04] provides a theoretical guarantee on the intra-cluster expansion, which, for example, greedy clustering approaches cannot give, as calculating the intra-cluster expansion of a cluster is NP-hard. This guarantee depends on a parameter value. A sequence of parameter values yields a clustering hierarchy. In the first part of this work we will present two algorithms for finding different clusterings. In particular, the second approach, which we have developed, guarantees that all possible clusterings are found and that the intervals of the parameter values for which a certain clustering is returned by the algorithm are exact. This is possible with no more than twice as many executions of the original cut-clustering algorithm as there are different clusterings in the hierarchy. In the second part of this work we examine the hierarchies that are defined by these clusterings and also compare the cut clusterings to clusterings calculated by a greedy algorithm based on modularity, a popular measure for the quality of clusterings. For most of the graphs in our test set of 304 graphs, the guarantee that the cut-clustering algorithm gives was better than a trivial lower bound on the intra-cluster expansion. We show that the clusterings of the cut-clustering algorithm tend to have a higher intra-cluster expansion and that their clusters, unlike the almost equally sized clusters of the modularity algorithm, have very different sizes. Some of the cut clusterings still have a modularity value that almost reaches the modularity value of the modularity clusterings.

German Summary

The cut-clustering algorithm, presented by Flake et al. [FTT04], is an algorithm for clustering weighted, undirected graphs, i.e., for dividing graphs into vertex-induced subgraphs that are as dense as possible compared to the edges between the clusters. One way to measure this density is the so-called intra-cluster expansion, which is defined as the minimum, over all cuts in the respective subgraph, of the cut weight divided by the size of the smaller cut side. Since computing the intra-cluster expansion of a cluster is NP-hard, the cut-clustering algorithm is particularly interesting: through a parameter value, it provides a lower bound on the intra-cluster expansion of the clustering to be computed. The clusterings for different parameter values are nested within one another and therefore form a hierarchy. The first part of this work presents two algorithms that determine the intervals of parameter values for which the cut-clustering algorithm computes a different clustering each. The first method is based on binary search; the second approach, newly developed in this work, uses insights from a parametric maximum s-t-flow algorithm and can guarantee that all possible clusterings are found with exact parameter intervals. This requires computing at most twice as many clusterings as there are distinct clusterings. The second part examines how well the guarantee given by the cut-clustering algorithm performs in practice compared with a trivial lower bound on the intra-cluster expansion. For the 304 graphs examined, the guarantee was found to be significantly better than the trivial lower bound in most cases.
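
The definition of intra-cluster expansion given above (minimum cut weight divided by the size of the smaller cut side) can be made concrete with a small brute-force sketch. This is an illustration only, not code from the thesis: the function names `cut_weight` and `intra_cluster_expansion`, the edge-dictionary representation, and the toy graph are all assumptions made for this example. The enumeration is exponential in the cluster size, which is consistent with the problem being NP-hard and is why a guaranteed lower bound is valuable.

```python
from itertools import combinations


def cut_weight(weights, side):
    """Total weight of edges with exactly one endpoint in `side`."""
    return sum(w for (u, v), w in weights.items() if (u in side) != (v in side))


def intra_cluster_expansion(vertices, weights):
    """Brute-force expansion of the subgraph induced by `vertices`.

    Returns the minimum over all cuts (S, rest) of
    cut weight / size of the smaller side. Hypothetical helper,
    only feasible for very small clusters.
    """
    vertices = list(vertices)
    members = set(vertices)
    n = len(vertices)
    if n < 2:
        raise ValueError("expansion needs at least two vertices")
    # Keep only edges inside the cluster (the vertex-induced subgraph).
    induced = {
        (u, v): w for (u, v), w in weights.items()
        if u in members and v in members
    }
    best = float("inf")
    # Sides of size <= n // 2 cover every cut, with S as a smaller side.
    for size in range(1, n // 2 + 1):
        for side in combinations(vertices, size):
            best = min(best, cut_weight(induced, set(side)) / size)
    return best


# Assumed toy graph: two unit-weight triangles joined by one light edge.
toy_weights = {
    ("a", "b"): 1.0, ("b", "c"): 1.0, ("a", "c"): 1.0,
    ("d", "e"): 1.0, ("e", "f"): 1.0, ("d", "f"): 1.0,
    ("c", "d"): 0.5,
}

whole_graph = intra_cluster_expansion(["a", "b", "c", "d", "e", "f"], toy_weights)
one_triangle = intra_cluster_expansion(["a", "b", "c"], toy_weights)
```

In this toy graph the sparsest cut of the whole graph separates the two triangles, so treating all six vertices as one cluster gives a low expansion, while each triangle on its own is much denser. A clustering with a guaranteed lower bound on expansion rules out clusters with such bottlenecks.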

Similar Resources

Complete hierarchical cut-clustering: A case study on expansion and modularity

In this work we study the hierarchical cut-clustering approach introduced by Flake et al., which is based on minimum s-t-cuts. The resulting cut-clusterings stand out due to strong connections inside the clusters, which indicate a clear membership of the vertices to the clusters. The algorithm uses a parameter which controls the coarseness of the resulting partition and which can be used to con...

Fully-Dynamic Hierarchical Graph Clustering Using Cut Trees

Algorithms or target functions for graph clustering rarely admit quality guarantees or optimal results in general. However, a hierarchical clustering algorithm by Flake et al., which is based on minimum s-t-cuts whose sink sides are of minimum size, yields such a provable guarantee. We introduce a new degree of freedom to this method by allowing arbitrary minimum s-t-cuts and show that this unr...

Determination of the Best Hierarchical Clustering Method for Regional Analysis of Base Flow Index in Kerman Province Catchments

The lack of complete coverage of hydrological data forces hydrologists to use the homogenization methods in regional analysis. In this research, in order to choose the best Hierarchical clustering method for regional analysis, base flow and related index were extracted from daily stream flow data using two parameter recursive digital filters in 43 hydrometric stations of the Kerman province. Ph...

Complete Hierarchical Cut-Clustering: A Case Study on Modularity and Expansion

We present a simple and efficient method for constructing a cut-clustering hierarchy as introduced by Flake et al. Cut-clusterings excel by a clearly indicated membership of the vertices to the clusters due to strong connections inside the clusters compared to only weak connections outside. Their coarseness depends on a parameter that provides a quality guarantee in terms of expansion, which is ...

Dynamic Graph Clustering Using Minimum-Cut Trees

Algorithms or target functions for graph clustering rarely admit quality guarantees or optimal results in general. Based on properties of minimum-cut trees, a clustering algorithm by Flake et al. does however yield such a provable guarantee, which ensures the quality of bottlenecks within the clustering. We show that the structure of minimum s-t-cuts in a graph allows for an efficient dynamic u...

Publication date: 2011